fix: import of w:sdt nested in paragraph (SD-2190) #2380
luccas-harbour wants to merge 9 commits into main
Allow paragraph translators to return fragment arrays during import and normalize them in the v2 paragraph importer. When the legacy paragraph handler returns mixed fragment output, apply encoded paragraph attributes only to paragraph nodes so embedded documentPartObject fragments remain unchanged. Add coverage for the array-return path in the paragraph translator tests.
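The normalization step described above could look roughly like the sketch below. The helper names `normalizeFragments` and `applyParagraphAttrs` and the plain-object node shape are assumptions made for illustration; they are not taken from the PR's source.

```javascript
// Hypothetical helpers; names and node shape are assumptions, not the PR's code.

// Accept a single node, an array of nodes, or nothing, and always return an array.
function normalizeFragments(result) {
  if (result == null) return [];
  return Array.isArray(result) ? result.filter(Boolean) : [result];
}

// Merge encoded paragraph attributes into paragraph nodes only.
// Any other fragment (e.g. documentPartObject) is returned unchanged.
function applyParagraphAttrs(fragments, encodedAttrs) {
  return fragments.map((node) =>
    node.type === 'paragraph'
      ? { ...node, attrs: { ...(node.attrs || {}), ...encodedAttrs } }
      : node
  );
}
```

With mixed output such as `[paragraph, documentPartObject]`, only the paragraph entry would pick up attributes like `rsidR`; the `documentPartObject` entry keeps its original `attrs` object.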
Status: FAIL
One spec violation in the export path, plus a minor default value issue.
1. The new
<w:p w:rsidRDefault="...">
<w:pPr>...</w:pPr>
<w:sdt>
<w:sdtContent>
<w:p>...</w:p> <!-- paragraph child! -->
</w:sdtContent>
</w:sdt>
</w:p>
Per §17.5.2.31, a
This is clearly intentional, since the PR is round-tripping documents that already had this unusual structure, but the export output is technically non-conformant OOXML. If Word generated the original file, it probably placed the
2.
docPartUnique: {
default: true, // pre-existing, not introduced here
},
This is pre-existing and not changed by this PR, but it interacts directly with the new
Everything else checks out.
Regarding the OOXML Spec Review failure above: the .docx file that prompted this PR has exactly the structure the report flags as invalid, yet it was created with Word:
<w:p w14:paraId="41964671" w14:textId="04598795" w:rsidR="00233D7B" w:rsidRPr="003104CE" w:rsidRDefault="00233D7B" w:rsidP="003104CE">
<w:sdt>
<w:sdtPr>
<w:id w:val="123456789"/>
<w:docPartObj>
<w:docPartGallery w:val="Table of Figures"/>
<w:docPartUnique/>
</w:docPartObj>
</w:sdtPr>
<w:sdtContent>
<w:p w14:paraId="11111111" w14:textId="11111111">
<w:r>
<w:t>Table of Figures</w:t>
</w:r>
</w:p>
<w:p w14:paraId="22222222" w14:textId="22222222">
<w:r>
<w:t>Figure 1</w:t>
</w:r>
<w:r>
<w:tab/>
</w:r>
<w:r>
<w:t>1</w:t>
</w:r>
</w:p>
</w:sdtContent>
</w:sdt>
</w:p>
Therefore, I don't think the analysis result is valid.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2772dfcb84
...per-editor/src/core/super-converter/v3/handlers/w/sdt/helpers/translate-document-part-obj.js
Outdated
Status: PASS
The OOXML handling in this PR is spec-compliant. Here's what I checked:
Paragraph rsid attributes (
The wrapping behavior (
No non-existent attributes, no incorrect defaults, and no new spec violations introduced. The
Summary
Fix DOCX import failures caused by w:sdt / w:docPartObj nodes being parsed as block content inside paragraph nodes.
This change makes paragraph import resilient when a block-level docPart SDT appears inside w:p, especially for cases like Table of Figures / Table of Contents placeholders at the start of a document.
Problem
Some documents contain XML like:
Our importer translated that w:sdt into a documentPartObject block node, but still attempted to keep it inside paragraph.content.
Since paragraph only allows inline content, import could fail with:
What changed
- isInlineNode() helper for schema-aware inline/block classification.
- w:p instead of being inserted into paragraph.content.
- paragraph fragments for inline runs
- documentPartObject
- paraId/textId stay on only one paragraph fragment
- rsid* attrs are still copied to paragraph fragments
- documentPartObject nodes so paragraph formatting can round-trip through export.
- w:p > w:sdt when wrapper paragraph metadata is present.
- sectPr placement for hoisted trailing/block-only docPart cases.
Tests
Added and updated regressions for:
- docPartObj inside w:p
- docPartObj
Limitations
The following limitations are known and accepted for now:
- documentPartObject is primarily used for round-trip/export and is not fully reflected in rendering behavior.
- paragraph fragments in edge cases.
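As a rough illustration of the hoisting described under "What changed", the sketch below splits a paragraph's translated children into paragraph fragments and hoisted block nodes. It is only a sketch under stated assumptions: the PR describes `isInlineNode()` as schema-aware, whereas here a fixed set of block type names stands in for the schema lookup, and `splitParagraph` is a hypothetical name.

```javascript
// Assumption: a fixed set replaces the schema lookup the PR describes.
const BLOCK_TYPES = new Set(['documentPartObject', 'table', 'paragraph']);

function isInlineNode(node) {
  return !BLOCK_TYPES.has(node.type);
}

// Hypothetical helper: split children into paragraph fragments (inline runs)
// and hoisted block nodes, preserving document order.
function splitParagraph(children, attrs = {}) {
  const { paraId, textId, ...shared } = attrs; // shared holds rsid* and other attrs
  const out = [];
  let run = [];
  let idsUsed = false;

  const flush = () => {
    if (run.length === 0) return;
    const fragAttrs = { ...shared };
    // paraId/textId stay on only the first paragraph fragment.
    if (!idsUsed) {
      if (paraId !== undefined) fragAttrs.paraId = paraId;
      if (textId !== undefined) fragAttrs.textId = textId;
      idsUsed = true;
    }
    out.push({ type: 'paragraph', attrs: fragAttrs, content: run });
    run = [];
  };

  for (const child of children) {
    if (isInlineNode(child)) {
      run.push(child);
    } else {
      flush();         // close the current inline run, if any
      out.push(child); // hoist the block node next to the paragraph fragments
    }
  }
  flush();
  return out;
}
```

In this sketch a paragraph whose only child is a `documentPartObject` produces no paragraph fragment at all, which corresponds to the block-only docPart case mentioned in the list.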